-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/metagenomics exchange #325
Conversation
554e0b0
to
355bee0
Compare
- Refactored the cli wrapper - Refactored the settings to be in line with the rest - Modified the unit tests
The test tests/me/test_populate_metagenomics_exchange.py::TestMeAPI::test_removals_dry_mode it's working now.
c0a15d7
to
a6e9eda
Compare
…nomics/emgapi into feature/metagenomics_exchange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff Kate, I left some comments :)
config/local-tests.yml
Outdated
# metagenomics exchange | ||
me_api: 'https://wwwdev.ebi.ac.uk/ena/registry/metagenome/api' | ||
me_api_token: 'mgx 871cd915-2826-46bb-94ed-8e6a3d9b6014' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to remove this token and ask Suran to grant us a new one, this has write access and this repo is public (even if it's dev env). This one should be in the secrets config file
logging.info(f"Analysis {ajob} updated successfully") | ||
# Just to be safe, update the MGX accession | ||
ajob.mgx_accession = registry_id | ||
ajob.last_mgx_indexed = timezone.now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one should also use the "nowish" timestamp: "timezone.now() + timedelta(minutes=1)"
emgapi/metagenomics_exchange.py
Outdated
return response | ||
|
||
def add_analysis(self, mgya: str, run_accession: str, public: bool): | ||
data = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "data" object is repeated in the command:
def generate_metadata(self, mgya, run_accession, status):
return {
"confidence": "full",
"endPoint": f"https://www.ebi.ac.uk/metagenomics/analyses/{mgya}",
"method": ["other_metadata"],
"sourceID": mgya,
"sequenceID": run_accession,
"status": status,
"brokerID": settings.METAGENOMICS_EXCHANGE_MGNIFY_BROKER,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep one copy of this, probably here
emgapi/models.py
Outdated
@@ -321,20 +331,65 @@ def to_add(self): | |||
|
|||
return self.filter(never_indexed | updated_after_indexing, not_suppressed, not_private) | |||
|
|||
def get_suppressed(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding this method, it's better to override "to_delete" with your implementation. This way we stick to the "interface" that IndexableModelQueryset provides
class MetagenomicsExchangeQueryset(IndexableModelQueryset):
index_field = "last_mgx_indexed"
def to_delete(self):
try:
self.model._meta.get_field("suppressed_at")
except FieldDoesNotExist:
return Q()
else:
return self.filter(
Q(suppressed_at__gte=F(self.index_field))
)
emgapi/models.py
Outdated
@@ -1634,6 +1687,70 @@ def __str__(self): | |||
return 'Assembly:{} - Sample:{}'.format(self.assembly, self.sample) | |||
|
|||
|
|||
class MetagenomicsExchangeQueryset(models.QuerySet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @KateSakharova , great stuff! Couple of small suggestions and queries inline.
emgapi/models.py
Outdated
class MetagenomicsExchangeModel(models.Model): | ||
"""Model to track Metagenomics Exchange population | ||
https://www.ebi.ac.uk/ena/registry/metagenome/api/ | ||
""" | ||
last_updated_me = models.DateTimeField( | ||
db_column='LAST_UPDATED_ME', | ||
auto_now=True | ||
) | ||
last_populated_me = models.DateTimeField( | ||
db_column='LAST_POPULATED_ME', | ||
null=True, | ||
blank=True, | ||
help_text="Date at which this model was last appeared in Metagenomics Exchange" | ||
) | ||
|
||
objects_for_population = MetagenomicsExchangeQueryset.as_manager() | ||
|
||
class Meta: | ||
abstract = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this abstract model for? Isn't last_populated_me
now done by last_mgx_indexed
in the MetagenomicsExchangeIndexedModel
? Maybe I misunderstood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch 🤦, it was left after ebi_search merge by mistake, removed
emgcli/settings.py
Outdated
try: | ||
METAGENOMICS_EXCHANGE_API = EMG_CONF['emg']['me_api'] | ||
if EMG_CONF['emg'].get('me_api_token'): | ||
METAGENOMICS_EXCHANGE_API_TOKEN = EMG_CONF['emg']['me_api_token'] | ||
else: | ||
METAGENOMICS_EXCHANGE_API_TOKEN = os.getenv('METAGENOMICS_EXCHANGE_API_TOKEN') | ||
except KeyError: | ||
warnings.warn("The metagenomics exchange API and Token are not configured properly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
METAGENOMICS_EXCHANGE_API = EMG_CONF['emg']['me_api'] | |
if EMG_CONF['emg'].get('me_api_token'): | |
METAGENOMICS_EXCHANGE_API_TOKEN = EMG_CONF['emg']['me_api_token'] | |
else: | |
METAGENOMICS_EXCHANGE_API_TOKEN = os.getenv('METAGENOMICS_EXCHANGE_API_TOKEN') | |
except KeyError: | |
warnings.warn("The metagenomics exchange API and Token are not configured properly") | |
try: | |
METAGENOMICS_EXCHANGE_API = EMG_CONF['emg']['me_api'] | |
METAGENOMICS_EXCHANGE_API_TOKEN = os.getenv('METAGENOMICS_EXCHANGE_API_TOKEN') | |
if not METAGENOMICS_EXCHANGE_API_TOKEN: | |
METAGENOMICS_EXCHANGE_API_TOKEN = EMG_CONF['emg']['me_api_token'] | |
except KeyError: | |
warnings.warn("The metagenomics exchange API and Token are not configured properly") |
Just to make sure that a missing token (not in env or in config) triggers the warning.
emgapi/metagenomics_exchange.py
Outdated
"""Metagenomics Exchange API Client""" | ||
|
||
def __init__(self, base_url=None): | ||
self.base_url = base_url if base_url else settings.METAGENOMICS_EXCHANGE_API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.base_url = base_url if base_url else settings.METAGENOMICS_EXCHANGE_API | |
self.base_url = base_url or settings.METAGENOMICS_EXCHANGE_API |
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
# Copyright 2017-2022 EMBL - European Bioinformatics Institute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Copyright 2017-2022 EMBL - European Bioinformatics Institute | |
# Copyright 2017-2024 EMBL - European Bioinformatics Institute |
emgapi/metagenomics_exchange.py
Outdated
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
|
||
# Copyright 2018-2023 EMBL - European Bioinformatics Institute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Copyright 2018-2023 EMBL - European Bioinformatics Institute | |
# Copyright 2018-2024 EMBL - European Bioinformatics Institute |
def process_to_index_and_update_records(self, analyses_to_index_and_update): | ||
logging.info(f"Indexing {len(analyses_to_index_and_update)} new analyses") | ||
|
||
for page in Paginator(analyses_to_index_and_update, 100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 100
is probably worth making a parameter – we had some trouble with EBI dump where reasonable sounding page sizes still caused query problem on the DB. If parametrized we can at least easily try different values.
logging.info(f"Analysis {ajob} updated successfully") | ||
# Just to be safe, update the MGX accession | ||
ajob.mgx_accession = registry_id | ||
ajob.last_mgx_indexed = timezone.now() + timedelta(minutes=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider here, is that if we offset individual jobs indexation by only 1 minute individually, by the time this loop exists and the bulk_update on L154 happens – that +minute may be in the past. This means that the auto last_update will now be after the last indexation (if Django decided to update the last_update for bulk updates... which I think it currently doesn't, but might in future). Perhaps worth a quick test / check the Django docs to be sure this won't be a problem.
…ndexed This is needed because RenameField doesn't allow you to change the db_column name. With this change it seems to work, and the data should be kept: ```sql -- -- Alter field last_indexed on analysisjob -- ALTER TABLE "ANALYSIS_JOB" RENAME COLUMN "LAST_INDEXED" TO "LAST_EBI_SEARCH_INDEXED"; -- -- Alter field last_indexed on study -- ALTER TABLE "STUDY" RENAME COLUMN "LAST_INDEXED" TO "LAST_EBI_SEARCH_INDEXED"; -- -- Rename field last_indexed on analysisjob to last_ebi_search_indexed -- -- -- Rename field last_indexed on study to last_ebi_search_indexed -- ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @KateSakharova !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kate!
Scripts to populate and track Metagenomics Exchange https://www.ebi.ac.uk/ena/registry/metagenome/api/
ME is based on EBI Search model.
System for ME works a bit differently. For update process we do not need to remove and then add a record to "XML" again. That is why I do not use
to_delete()
function.I've created
get_suppressed()
to filter records that need to be removed from ME API.